Assembling a parallel corpus from RSS news feeds

Author

  • John Fry
Abstract

We describe our use of RSS news feeds to quickly assemble a parallel English-Japanese corpus. Our method is simpler than other web mining approaches, and it produces a parallel corpus whose quality, quantity, and rate of growth are stable and predictable.
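The abstract does not spell out how feed items in the two languages are matched, so the following is only a minimal sketch of the general idea, not the author's method. It parses two RSS 2.0 documents with Python's standard library and pairs items under a purely hypothetical assumption: that the English and Japanese versions of an article share the last path segment of their link URL. The function names `parse_rss_items`, `article_key`, and `pair_items` are illustrative.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return a list of {"title", "link"} dicts, one per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


def article_key(link):
    """Hypothetical pairing key: the last path segment of the item's URL.

    This is an assumption made for illustration; real feeds may need a
    different key (for example, an explicit link back to the original).
    """
    return link.rstrip("/").rsplit("/", 1)[-1]


def pair_items(en_items, ja_items):
    """Pair English and Japanese items whose links share the same key."""
    ja_by_key = {article_key(item["link"]): item for item in ja_items}
    pairs = []
    for en in en_items:
        ja = ja_by_key.get(article_key(en["link"]))
        if ja is not None:
            pairs.append((en["title"], ja["title"]))
    return pairs
```

Polling both feeds on a schedule and appending newly paired items would let such a corpus grow steadily, in line with the predictable growth the abstract mentions.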

Similar resources

Automated System for Improving RSS Feeds Data Quality

Nowadays, the majority of RSS feeds provide incomplete information about their news items. The lack of information leads to engagement loss in users. We present a new automated system for improving the RSS feeds’ data quality. RSS feeds provide a list of the latest news items ordered by date. Therefore, it makes it easy for a web crawler to precisely locate the item and extract its raw content....

Synthesizing correlated RSS news articles based on a fuzzy equivalence relation

Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particula...

Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information

Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. In order to better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds in order to locate articles pertaining ...

Matt Fuller

Traditionally, users subscribe to RSS feeds of interest using an RSS feed reader. The RSS feed reader periodically polls the subscribed feeds for updates or items to be displayed to the user. Many RSS feeds pertain to a single news source or blog. Others aggregate various feeds, usually on some topic, and produce a single RSS feed. Middleware publish-subscribe systems allow users to sub...

(X)querying RSS/Atom Feeds Extracted from News Web Sites: a Cocoon-based Portal

The Web is fast becoming the predominant source of news and information for many people. In the past few years, a new delivery system has emerged in the form of RSS feeds. Such feeds normally provide a brief summary of a longer news item posted on the Web. RSS feeds, collected to form “channels” according to some thematic criteria, can be accessed using Web browsers or specialized software called “news a...

Publication date: 2005